Multiclass Continuous Correspondence Learning

Authors

  • Brian D. Bue
  • David R. Thompson
Abstract

We extend the Structural Correspondence Learning (SCL) domain adaptation algorithm of Blitzer et al. [4] to the realm of continuous signals. Given a set of labeled examples belonging to a “source” domain, we select a set of unlabeled examples in a related “target” domain that play similar roles in both domains. Using these “pivot samples,” we map both domains into a common feature space, allowing us to adapt a classifier trained on source examples to classify target examples. We show that when between-class distances are relatively preserved across domains, we can automatically select target pivots to bring the domains into correspondence.

1 Structural Correspondence Learning for Continuous Feature Spaces

We extend the Structural Correspondence Learning (SCL) algorithm of Blitzer et al. [4] to continuous signals. SCL is a domain adaptation technique which creates a mapping between a “source” domain consisting of labeled examples, and an unlabeled “target” domain using a set of “pivot features” common to both domains. In text classification scenarios, these consist of terms (words) that serve similar roles in both domains, so that the role of other features can be inferred by correlation. We extend this concept to continuous domains where the objects we classify are continuous-valued functions, making SCL applicable to data such as time series or electromagnetic spectral signatures.

Recent work by Balcan et al. [1] provides an elegant method to define a correspondence mapping between continuous feature spaces. They illustrated that designing a good feature space is similar to designing a good kernel function, and under certain conditions, a kernel which approximately preserves the margin of a max-margin separator can be constructed using a set of unlabeled samples. By projecting samples into a space defined by (distances to) the unlabeled samples, one can potentially harness the power of a high-dimensional kernel mapping in this lower-dimensional feature space.
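The idea of projecting samples into a space defined by distances to a set of unlabeled samples can be illustrated with a small sketch. This is not the construction of Balcan et al. [1] or the authors' implementation; the function names `euclidean` and `landmark_features`, the choice of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def landmark_features(samples, landmarks, dist=euclidean):
    """Represent each sample by its distances to every landmark.

    The output has one coordinate per landmark, regardless of how
    many dimensions the original samples have.
    """
    return [[dist(s, l) for l in landmarks] for s in samples]


# Hypothetical toy data: two 3-dimensional samples, two unlabeled landmarks.
samples = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
landmarks = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]

features = landmark_features(samples, landmarks)
print(features)
```

Each sample is reduced to a vector with one entry per landmark, so the dimensionality of the new space is set by the number of landmarks rather than by the length of the original signal.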
In a similar vein, we define our correspondence mapping using distances to canonical samples, or pivot samples. These distances become the pivot features we use to reconcile differences between the source and target domains.

Determining a mapping between domains is closely related to the topic of manifold alignment. Most manifold alignment algorithms assume knowledge of the target domain, in the form of paired (source to target) correspondences [11], [13] or a number of labeled target examples [8], to define a transformation that reconciles the feature spaces. Recent work (e.g., [12]) instead determines the correspondence mapping automatically by matching local geometric properties across feature spaces.

This work presents Multiclass Continuous Correspondence Learning (MCCL): a domain adaptation technique for high-dimensional continuous data. In previous work [5], [6], we demonstrated the feasibility of a similar domain adaptation technique for continuous data, specifically hyperspectral imagery. In this work, we show that by exploiting structured relationships between a diverse set of source classes, we can automatically select a set of pivot samples to reconcile differences between source and target domains.
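As a rough illustration of how pivot features can reconcile two domains, the sketch below maps source samples by distances to source pivots and target samples by distances to target pivots, then applies a classifier trained only on source data. Several things here are assumptions and not taken from the paper: the pivots are supplied by hand (MCCL's automatic pivot selection is not reproduced), the target domain is a simple shifted copy of the source so that between-class distances are preserved, and the nearest-centroid classifier and all function names are hypothetical stand-ins.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_features(samples, pivots):
    """Map each sample to its vector of distances to the pivots."""
    return [[euclidean(s, p) for p in pivots] for s in samples]


def fit_centroids(features, labels):
    """Nearest-centroid 'training': average the features of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, features):
    """Assign each feature vector to the class with the closest centroid."""
    return [min(centroids, key=lambda y: euclidean(f, centroids[y]))
            for f in features]


# Toy source domain with two labeled classes.
source = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.8]]
source_labels = ["a", "a", "b", "b"]

# Toy target domain: the same structure under an assumed constant offset.
shift = [10.0, -3.0]
target = [[x + shift[0], y + shift[1]] for x, y in [[0.1, 0.2], [3.9, 4.2]]]

# Hand-picked pivots playing the same role in each domain (an assumption).
source_pivots = [[0.0, 0.0], [4.0, 4.0]]
target_pivots = [[p[0] + shift[0], p[1] + shift[1]] for p in source_pivots]

# Train on source pivot features, classify target pivot features.
centroids = fit_centroids(pivot_features(source, source_pivots), source_labels)
predicted = predict(centroids, pivot_features(target, target_pivots))

# Baseline: the same classifier applied to the raw, unaligned coordinates.
raw = predict(fit_centroids(source, source_labels), target)

print(predicted, raw)
```

In this toy setting the pivot-feature classifier recovers both target classes, while the raw-coordinate baseline assigns both target samples to the same class because the offset moves them away from the source centroids.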

Similar resources

Semi-supervised learning on closed set lattices

We propose a new approach for semi-supervised learning using closed set lattices, which have been recently used for frequent pattern mining within the framework of the data analysis technique of Formal Concept Analysis (FCA). We present a learning algorithm, called SELF (SEmi-supervised Learning via FCA), which performs as a multiclass classifier and a label ranker for mixed-type data containin...

Output Coding Methods: Review and Experimental Comparison

Classification is one of the ubiquitous problems in Artificial Intelligence. It is present in almost any application where Machine Learning is used. That is the reason why it is one of the Machine Learning issues that has received more research attention from the first works in the field. The intuitive statement of the problem is simple, depending on our application we define a number of differ...

Efficient Multiclass Boosting Classification with Active Learning

We propose a novel multiclass classification algorithm Gentle Adaptive Multiclass Boosting Learning (GAMBLE). The algorithm naturally extends the two class Gentle AdaBoost algorithm to multiclass classification by using the multiclass exponential loss and the multiclass response encoding scheme. Unlike other multiclass algorithms which reduce the K-class classification task to K binary classifi...

Universum Learning for Multiclass SVM

We introduce Universum learning [1], [2] for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose a span bound for MU-SVM that can be used for model selection thereby avoiding resampling. Empirical results demonstrate the effectiveness of MU-SVM and the proposed bound.

Solving Multiclass Learning Problems via Error-Correcting Output

Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k “classes”). The definition is acquired by studying collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorit...

Publication date: 2011